38 research outputs found

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    Exploring health data with clustering algorithms makes it possible to better describe populations of interest by identifying the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or about a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains little discussed. This results in significant variability in the execution of data science projects, in terms of the algorithms used and the reliability and credibility of the designed approach. Favoring parsimonious and judicious choices of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. The workflow strikes a compromise between (1) genericity of applications (e.g. usable on small or big data; on continuous, categorical or mixed variables; on databases of high dimensionality or not), (2) ease of implementation (few packages, few algorithms, few parameters), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of cluster stability, management of noise and multicollinearity). The workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, for whom it makes data clustering easier and more robust, and for more experienced data scientists looking for a straightforward and reliable solution for routine preliminary data mining. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is provided, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request from the authors.

    Etude des interactions entre le Médiateur et les facteurs généraux de la transcription par l'ARN polymérase II.

    In eukaryotes, RNA polymerase II (Pol II) is responsible for the transcription of coding genes and a large number of non-coding RNAs. The first step in transcription activation is the recognition of DNA motifs by activators that trigger the recruitment of co-activators, general transcription factors (GTFs) and ultimately Pol II to form a preinitiation complex (PIC). During my thesis, we focused on Mediator, a multiprotein complex that plays a critical role in this process. We found several interactions between Mediator and transcription factors that shed light on how it could influence transcription initiation. First, we discovered a genetic interaction between MED31, which encodes the most conserved Mediator subunit, and DST1, which encodes the elongation factor TFIIS. Surprisingly, we revealed a new role for TFIIS, which acts in conjunction with Mediator during transcription initiation to recruit Pol II to promoters. Then, we identified a direct interaction between the Mediator head subunit Med11 and the TFIIH subunit Rad3. We explored the significance of this interaction and of those between Med11 and the head module subunits Med17 and Med22, and found that impairing these interactions could differentially affect the recruitment of TFIIH, TFIIE and Pol II to the PIC or destabilize the association of TFIIH modules. We also found that a med11 mutation that altered promoter occupancy by the TFIIK kinase module of TFIIH reduced Pol II CTD serine 5 phosphorylation. We conclude that the Mediator head module plays a critical role in the recruitment of TFIIH, TFIIE and Pol II to the PIC. Altogether, these results suggest a branched assembly pathway in PIC formation.
    French abstract, translated: RNA polymerase II transcribes small non-coding RNAs and messenger RNAs. The first step in transcription activation is the recognition of DNA regions by specific activators. This allows the recruitment of co-activators, general transcription factors (GTFs) and Pol II to form the preinitiation complex (PIC). During my thesis, we focused on the role of Mediator, a multiprotein complex that plays an essential role in this process. We discovered a genetic interaction between MED31, which encodes the most conserved Mediator subunit, and DST1, which encodes the elongation factor TFIIS. Surprisingly, our study revealed a new role for TFIIS, which acts during transcription initiation in conjunction with Mediator to recruit Pol II to the promoter region of genes. We then identified a direct interaction between Med11 and Rad3, subunits of the Mediator head and of TFIIH, respectively. We went on to study this interaction and those between Med11 and its partners within the complex, Med17 and Med22. Mutants that affect these interactions show defects in the recruitment of TFIIH, TFIIE and Pol II, or destabilize the association of TFIIH modules. We also uncovered the role of Mediator in Pol II CTD phosphorylation via the stabilization of TFIIK at the promoter region. Taken together, our results reveal an essential role of Mediator in the recruitment of general factors during initiation and suggest a branched assembly during PIC formation.

    Etude des interactions entre le médiateur et les facteurs généraux de la transcription par l'ARN polymérase II

    ORSAY-PARIS 11-BU Sciences (914712101) / Sudoc, France

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    Exploring health data with clustering algorithms makes it possible to better describe populations of interest by identifying the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or about a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains little discussed. This results in significant diversity in the execution of data science projects, in terms of the algorithms used and the reliability and credibility of the designed approach. Favoring parsimonious and judicious choices of both algorithms and implementations at each stage, this paper proposes Qluster, a practical workflow for performing clustering tasks. The workflow is intended to be (1) generic, as it is suitable regardless of data volume (small/big) and of the nature of the variables (continuous/qualitative/mixed), (2) easy to implement, as it relies on a few easy-to-use software packages, and (3) robust, through stability evaluation of the final clusters and the use of recognized algorithms and implementations. The workflow can be easily automated and/or routinely applied to a wide range of clustering projects. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is provided, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request from the authors.

    SRF co-factors control the balance between cell proliferation and contractility

    Highlights:
    - Integrated ChIP-seq Hi-C analysis identifies over 700 TCF-dependent SRF target genes
    - Over 60% of TPA-inducible gene transcription is TCF-dependent
    - TCF-dependent transcription potentiates cell proliferation
    - TCF/MRTF competition for SRF determines contractility and pro-invasive behavior